Convolution-augmented Transformer for Speech Recognition